Towards a comprehensive dataset of vocal imitations of drum sounds
Abstract
The voice is a rich and powerful means of expressing acoustic concepts such as musical sounds. Recent research on vocal imitations has demonstrated the viability of using the voice to search for sounds through query by vocalisation. Here we present the methods used to develop a dataset for evaluating the performance of query-by-vocalisation systems for drum sounds. The dataset consists of imitations of 30 drum samples from a commercial drum sample library, performed by 14 musicians with experience in computer-based music production. It also includes participants' ratings of their satisfaction with each imitation, and perceptual similarity ratings between each imitation and the sounds being imitated, collected via an online MUSHRA-style listening test.

1. REQUIREMENTS OF THE DATASET

Searching for drum sounds is a core part of the electronic music making process [1], and the voice is an effective means of describing sounds [2]. Query by vocalisation (QBV) systems allow a user to search for a sound by vocalising an example of the desired sound [3,4]. This interaction modality offers musicians, music producers and sound engineers an intuitive way to search for musical sounds using intelligent search methods. However, to build a QBV system that retrieves perceptually relevant sounds, we require a model that maps between the sound spaces of the voice and of a sample library, based on a priori knowledge of the perceptual similarity between vocal imitations and the samples in question. To design and build such a model, we require a dataset of prototypical vocal imitations that includes perceptual similarity ratings between the imitations and each of the sounds being imitated. The primary aim of this work is to develop such a dataset specifically for drum samples; however, the methods used here could also be applied to other types of sample library.

2. SELECTING THE STIMULI

The drum samples were selected from the fxpansion (https://www.fxpansion.com) BFD3 Core and 8BitKit sample libraries, with six samples taken from each of five drum classes (kicks, snares, hi-hats, toms and cymbals), giving thirty samples to be used as stimuli. The samples for each class were selected as follows: first, a seed sample was chosen at random; next, the samples most and least similar to the seed were selected, based on a within-drum-class similarity measure using auditory images [5]; finally, three samples whose distances from the seed were equally spaced between those of the closest and furthest samples were selected. This yields six samples per drum class that are representative of the variety of sounds in the sample library.

The auditory-image-based similarity measure is an implementation of the best-performing method in [5], which the authors of [5] found to be highly correlated with perceptual similarity ratings of within-class drum sounds for the bass drum, snare and tom classes. In brief, it measures the distance between the spectrograms of two drum sounds after the following pre-processing: lengths are matched by zero-padding the shorter spectrogram; loudness (in dB) is scaled using Terhardt's ear model [6]; and frequency is scaled using the Bark scale.

3. RECORDING THE IMITATIONS

Fourteen participants recorded vocal imitations of the thirty selected samples. The recording workflow is shown in Figure 1. Participants could practise and re-record their imitation of each stimulus as many times as they wished. After recording each imitation, participants rated their satisfaction with it on a five-point Likert scale ranging from "completely dissatisfied" to "completely satisfied".
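The stimulus selection procedure of Section 2 (ear-weighted, Bark-scaled spectrogram distance, then seed, closest, furthest and three evenly spaced samples) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the implementation used by the authors or in [5], and it does not compute full auditory images. It assumes power spectrograms sharing one frequency grid, uses the Zwicker and Terhardt (1980) Bark approximation, applies Terhardt's outer/middle-ear weighting as a per-bin gain, sums power within integer Bark bands, floors the dB scale at 0 dB so that zero-padded frames read as silence, and uses a Euclidean distance. "Equally spaced" is interpreted as picking the unchosen samples nearest to evenly spaced target distances. All function names are hypothetical.

```python
import numpy as np


def terhardt_weight_db(freqs_hz):
    """Terhardt outer/middle-ear weighting in dB (formula takes f in kHz)."""
    f = np.maximum(freqs_hz, 20.0) / 1000.0  # clip low bins to avoid the pole at 0 Hz
    return -3.64 * f ** -0.8 + 6.5 * np.exp(-0.6 * (f - 3.3) ** 2) - 1e-3 * f ** 4


def hz_to_bark(freqs_hz):
    """Zwicker & Terhardt (1980) critical-band-rate approximation."""
    f = np.asarray(freqs_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def auditory_spectrogram(power, freqs_hz, ref_power=1e-10):
    """Ear-weighted, Bark-banded spectrogram in dB, floored at 0 dB (assumed silence)."""
    weighted = power * 10.0 ** (terhardt_weight_db(freqs_hz)[:, None] / 10.0)
    bands = np.floor(hz_to_bark(freqs_hz)).astype(int)
    banded = np.zeros((int(bands.max()) + 1, power.shape[1]))
    np.add.at(banded, bands, weighted)  # sum power of all bins in each Bark band
    return 10.0 * np.log10(np.maximum(banded / ref_power, 1.0))


def drum_distance(power_a, power_b, freqs_hz):
    """Euclidean distance after zero-padding the shorter spectrogram in time."""
    a = auditory_spectrogram(power_a, freqs_hz)
    b = auditory_spectrogram(power_b, freqs_hz)
    n = max(a.shape[1], b.shape[1])
    a = np.pad(a, ((0, 0), (0, n - a.shape[1])))  # 0 dB frames = silence
    b = np.pad(b, ((0, 0), (0, n - b.shape[1])))
    return float(np.linalg.norm(a - b))


def select_stimuli(dist, seed_idx, n_between=3):
    """Pick seed, closest, furthest and n_between evenly spaced samples.

    dist is an (N, N) distance matrix for one drum class; returns indices.
    """
    d = np.asarray(dist[seed_idx], dtype=float)
    candidates = [i for i in range(len(d)) if i != seed_idx]
    closest = min(candidates, key=lambda i: d[i])
    furthest = max(candidates, key=lambda i: d[i])
    chosen = [seed_idx, closest, furthest]
    # Target distances strictly between the closest and furthest samples.
    targets = np.linspace(d[closest], d[furthest], n_between + 2)[1:-1]
    for t in targets:
        remaining = [i for i in candidates if i not in chosen]
        chosen.append(min(remaining, key=lambda i: abs(d[i] - t)))
    return chosen


if __name__ == "__main__":
    # Placeholder random spectrograms standing in for one drum class (not real data).
    rng = np.random.default_rng(0)
    freqs = np.linspace(0, 22050, 513)
    specs = [rng.random((513, int(rng.integers(20, 60)))) for _ in range(12)]
    dist = np.array([[drum_distance(a, b, freqs) for b in specs] for a in specs])
    seed_idx = int(rng.integers(len(specs)))  # the randomly chosen seed sample
    print(select_stimuli(dist, seed_idx))
```

Padding in the floored dB domain is a deliberate choice here: it makes appended silence contribute nothing to the distance, whereas padding raw power and then taking logarithms would need an arbitrary floor.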
Proceedings of the 2nd AES Workshop on Intelligent Music Production, London, UK, 13 September 2016. Adib Mehrabi, Simon Dixon and Mark Sandler, Centre for Digital Music, Queen Mary University of London, {a.mehrabi, s.e.dixon, mark.sandler}@qmul.ac.uk